Abstract

We consider a random fitness landscape on the space of haploid diallelic genotypes with $n$
genetic loci, where each genotype is considered either inviable or viable depending on whether
or not there are any incompatibilities among its allele pairs. We suppose that each allele
pair in the set of all possible allele pairs on the $n$ loci is independently incompatible with
probability $p = c/(2n)$. We examine the connectivity of the viable genotypes under single
locus mutations and show that, for $0 < c < 1$, the number of clusters of viable genotypes
in this landscape converges weakly (in $n$) to $N = 2^{\Psi}$, where $\Psi$ is Poisson distributed; while
for $c > 1$, there are no viable genotypes with probability converging to one. The genotype
space is equivalent to the hypercube $Q^n$ and the viable genotypes are solutions to a random
2-SAT problem, so the same result holds for the connectivity of solutions in $Q^n$ to a random
2-SAT problem.



1 Introduction 

The space of diallelic haploid genotypes on $n$ genetic loci is in bijection with the set of binary
strings of length $n$ or the vertices of the $n$-cube $Q^n$. Let $[n] = \{1, \ldots, n\}$ be the set of genetic
loci. At each locus $i$, let the two alleles be denoted $0_i$ and $1_i$. We define a pair of alleles $(x, y)$ on
distinct loci to be incompatible if the existence of these alleles in a genotype is lethal. Suppose
we are given a list of incompatibilities $L$. We say that a genotype is viable if none of its allele
pairs is on the list $L$.

This model is an example of a fitness landscape. The notion of fitness landscapes was
introduced by the theoretical evolutionary biologist Sewall Wright in [15] (see also [13], [9]). The
study of fitness landscapes has proved extremely useful both in biology and well outside of it. In
the standard interpretation, a fitness landscape is a relationship between a set of genes (or a set
of quantitative characters) and a measure of fitness (e.g. viability, fertility, or mating success).
In Wright's original formulation the set of genes (or quantitative characters) is the property of
an individual. However, the notion of fitness landscapes can be generalized to the level of a
mating pair, or even a population of individuals. For a comprehensive introduction to fitness
landscapes see [9]; or for descriptions more closely aligned with the ideas here, see [12] or [10].

This model is easily translated into propositional logic. To each locus $1 \le i \le n$ we associate
a boolean variable $\xi_i$. The two truth values of $\xi_i$ are referred to as the positive literal and the
negative literal of $\xi_i$. We make the convention that the allele $0_i$ is the negative literal and $1_i$
is the positive literal. To identify a literal in this model we need both locus and value, but we
put the locus in the subscript. The negation of any literal $x$ is denoted $\bar{x}$. Thus, $\bar{0}_i = 1_i$ and
$\bar{1}_i = 0_i$. Alleles at distinct loci or literals of distinct variables are said to be strictly distinct.
An assignment of $n$ strictly distinct literals to $n$ boolean variables is referred to as a truth
assignment to the $n$ variables. Thus, each genotype is equivalent to a truth assignment.

If $(x, y)$ is an incompatibility, then a genotype that has allele $x$ must also have allele $\bar{y}$ to
be viable, and a genotype with allele $y$ must have $\bar{x}$. Thus, there are two implications naturally
associated to the incompatibility $(x, y)$: $x \Rightarrow \bar{y}$ and $y \Rightarrow \bar{x}$. It follows that the incompatibility
$(x, y)$ is equivalent to the 2-clause or disjunction $\bar{x} \vee \bar{y}$. Let $c_1, \ldots, c_m$ be 2-clauses with literals
chosen from the $n$ variables. Then

$$F = c_1 \wedge \cdots \wedge c_m$$

is a 2-formula. If there is some literal assignment to the n variables for which the formula F is 
TRUE, then it is said to be satisfied or SAT. Determining if a formula is satisfiable is known as 
the 2-SAT problem. 

Given the equivalence of 2-clauses and incompatibilities, the set of lists of m incompatibilities 
and the set of 2-formulae of length m are in bijection. Moreover, if L is a list of incompatibilities 
and F the corresponding 2-formula, then a genotype having no allele pairs on L is equivalent 
to the associated truth assignment satisfying the 2-formula F. Thus, every viable genotype 
determined by L is a solution to the 2-SAT problem corresponding to F and vice versa. For 
the remainder of this paper, we let F refer to either the formula or the list of incompatibilities 
without distinction. 
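The equivalence between viability and satisfaction can be checked mechanically on a small instance. The following is a minimal sketch, not part of the paper: the encoding of an allele as a `(locus, value)` pair, the function names, and the sample list `L` on three loci are all assumptions made for illustration.

```python
from itertools import product

# Assumed encoding: an allele is (locus, value) with locus in 1..n and value in {0, 1};
# an incompatibility is a pair of alleles on distinct loci.

def is_viable(genotype, incompatibilities):
    """True if the genotype carries no incompatible allele pair."""
    return not any(genotype[i - 1] == a and genotype[j - 1] == b
                   for (i, a), (j, b) in incompatibilities)

def satisfies(genotype, incompatibilities):
    """Evaluate the 2-formula whose clauses are (not x) or (not y)."""
    return all(genotype[i - 1] != a or genotype[j - 1] != b
               for (i, a), (j, b) in incompatibilities)

def viable_genotypes(n, incompatibilities):
    """Brute-force list of viable genotypes (exponential in n, small n only)."""
    return [g for g in product((0, 1), repeat=n)
            if is_viable(g, incompatibilities)]

# Hypothetical list on n = 3 loci: the pairs (0_2, 1_3) and (0_1, 1_2).
L = [((2, 0), (3, 1)), ((1, 0), (2, 1))]
viable = viable_genotypes(3, L)
```

Here `is_viable` reads the list as incompatibilities and `satisfies` reads it as a conjunction of 2-clauses; the two agree on every genotype, which is the bijection described above.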

We are interested in the connectivity of viable genotypes through single locus mutations. 
If two viable genotypes are connected by a path of single locus mutations we say that they 
are in the same cluster. Thus, each cluster of viable genotypes corresponds to a collection of 
genotypes which can each evolve into any other genotype in the cluster without passing through 
an inviable genotype and without altering two or more alleles simultaneously. We let the edges of
$Q^n$ represent single locus mutations, so the clusters of viable genotypes determined by a formula
$F$ correspond to maximal connected components in the subgraph $Q_F = Q_F^n \subset Q^n$.
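For small $n$ the clusters can be counted directly by a breadth-first search over the viable vertices of the cube. The sketch below is illustrative only; the `(locus, value)` encoding of alleles, the name `count_clusters`, and the two-locus instances are assumptions, and the brute-force enumeration is exponential in $n$.

```python
from collections import deque
from itertools import product

def count_clusters(n, incompatibilities):
    """Number of connected components of viable genotypes under single-locus mutations."""
    def viable(g):
        return not any(g[i - 1] == a and g[j - 1] == b
                       for (i, a), (j, b) in incompatibilities)

    unvisited = {g for g in product((0, 1), repeat=n) if viable(g)}
    clusters = 0
    while unvisited:
        queue = deque([unvisited.pop()])   # start a new cluster
        clusters += 1
        while queue:
            g = queue.popleft()
            for k in range(n):
                h = g[:k] + (1 - g[k],) + g[k + 1:]   # mutate locus k + 1
                if h in unvisited:
                    unvisited.remove(h)
                    queue.append(h)
    return clusters

# Hypothetical instances on n = 2 loci. Forbidding (0_1, 1_2) and (1_1, 0_2)
# leaves only 00 and 11, which are two mutations apart.
two_clusters = count_clusters(2, [((1, 0), (2, 1)), ((1, 1), (2, 0))])
one_cluster = count_clusters(2, [((1, 0), (2, 1))])
```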

A random subgraph can be chosen by supposing that each pair of alleles is incompatible
independently with probability $p$. We denote the resulting set of incompatibilities by $F_p = F_p^n$
and let $Q_{F_p} = Q^n_{F_p}$ be the random subgraph that is induced by the viable genotypes. Since it
will be clear from context whether or not the set of incompatibilities is randomly determined,
we simply denote this set by $F$ (or $F^n$).

The main result of this paper concerns the random variable $N_n$ which counts the number of
clusters of viable genotypes in the random subgraph $Q_F$.
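The random set of incompatibilities can be sampled as follows. This is a sketch under assumptions: the function name, the `(locus, value)` encoding of alleles, and the values of `n`, `c` and the seed are chosen for illustration. There are $4\binom{n}{2} = 2n(n-1)$ allele pairs on distinct loci, so with $p = c/(2n)$ the expected number of incompatibilities is $c(n-1)$.

```python
import random
from itertools import combinations

def random_incompatibilities(n, p, rng):
    """Include each allele pair on distinct loci independently with probability p."""
    pairs = []
    for i, j in combinations(range(1, n + 1), 2):
        for a in (0, 1):
            for b in (0, 1):
                if rng.random() < p:
                    pairs.append(((i, a), (j, b)))
    return pairs

# Illustrative parameters (assumptions, not values from the paper).
n, c = 50, 0.8
rng = random.Random(0)
F = random_incompatibilities(n, c / (2 * n), rng)
```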

Theorem 1. Let $F$, $Q_F$ and $N_n$ be randomly determined as above. If $p = c/(2n)$ and $c < 1$,
then $N_n$ converges weakly to $N = 2^{\Psi}$, where $\Psi$ is a Poisson random variable with mean

$$\lambda = -\tfrac{1}{2}\left(\ln(1-c) + c\right).$$

In particular, the probability that there is a unique cluster converges to

$$e^{-\lambda} = \sqrt{(1-c)\,e^{c}}.$$
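To get a feel for the sizes involved, the limiting quantities of Theorem 1 can be evaluated numerically. The sketch below assumes the illustrative value $c = 0.5$ and hypothetical function names; it only evaluates the closed forms stated in the theorem.

```python
import math

def poisson_mean(c):
    """lambda = -(ln(1 - c) + c) / 2, for 0 < c < 1."""
    return -0.5 * (math.log(1.0 - c) + c)

def prob_unique_cluster(c):
    """Limiting probability of a single cluster, sqrt((1 - c) e^c)."""
    return math.sqrt((1.0 - c) * math.exp(c))

def prob_clusters(c, k):
    """Limiting P(N = 2^k) = P(Psi = k) for Psi Poisson with mean lambda."""
    lam = poisson_mean(c)
    return math.exp(-lam) * lam ** k / math.factorial(k)

c = 0.5                      # illustrative choice
lam = poisson_mean(c)        # roughly 0.0966
unique = prob_unique_cluster(c)   # roughly 0.908
```

For this value of $c$ the limiting landscape consists of a single cluster about nine times out of ten, and `prob_clusters(c, 1)` gives the limiting probability of exactly two clusters.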

Determining the probability that there are any viable genotypes in this model is the random
2-satisfiability or random 2-SAT problem where each possible 2-clause is included with
probability $p$. The results on random 2-SAT found in [7], [11], [8], [5], [6], [12] imply the following
statements with probability approaching one as $n$ approaches infinity.

Let $p = c/(2n)$, where $c > 0$ is constant. Then, if $c > 1$, $F$ allows no viable genotypes; and
if $c < 1$, then the number of viable genotypes in $Q_F$ increases exponentially with $n$.

We should also point out that the theorem here may be related to the clustering results 
concerning solutions to random 2-SAT found in [3] utilizing the replica method. 

2 The digraph 

Every formula $F$ can be associated to a digraph $D_F$ as defined by Aspvall et al. The vertex
set of $D_F$ is the set of alleles

$$\{0_1, \ldots, 0_n, 1_1, \ldots, 1_n\}.$$

For each incompatibility $(x, y) \in F$ the directed edges $x \to \bar{y}$ and $y \to \bar{x}$ are included in $D_F$, one
edge for each implication. If there is a directed path from $x$ to $y$ in $D_F$ it is denoted by $x \leadsto y$,
and this path is in $D_F$ if and only if the path $\bar{y} \leadsto \bar{x}$ is in $D_F$.

The set of alleles and edges that can be reached from $x$ is referred to as the out-graph $D_F^+(x)$
and the allele set of $D_F^+(x)$ is denoted

$$L^+(x) = L_F^+(x) = \{y : x \leadsto y\}.$$

Since edges are equivalent to implications, these are the alleles that must be present in any
viable genotype with allele $x$. Of course, a genotype may have all the alleles in $L^+(x)$ and yet
still be inviable due to an unrelated incompatibility.
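The digraph and its out-sets are easy to build explicitly. The sketch below is an illustration under assumptions: alleles are encoded as `(locus, value)` pairs, the function names are hypothetical, the out-set is taken to include the starting allele itself, and the formula `F` on three loci is an arbitrary small example.

```python
def complement(allele):
    """Swap the value of an allele (locus, value)."""
    locus, value = allele
    return (locus, 1 - value)

def implication_digraph(n, incompatibilities):
    """Adjacency sets: the incompatibility (x, y) adds x -> not-y and y -> not-x."""
    edges = {(i, v): set() for i in range(1, n + 1) for v in (0, 1)}
    for x, y in incompatibilities:
        edges[x].add(complement(y))
        edges[y].add(complement(x))
    return edges

def out_set(edges, x):
    """Alleles reachable from x by a directed path, with x itself included."""
    seen, stack = {x}, [x]
    while stack:
        for w in edges[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

# Hypothetical formula on n = 3 loci: the pairs (0_2, 1_3) and (0_1, 1_2).
F = [((2, 0), (3, 1)), ((1, 0), (2, 1))]
D = implication_digraph(3, F)
forced_by_0_1 = out_set(D, (1, 0))   # alleles forced by carrying 0_1
```

In this example a viable genotype with allele $0_1$ must also carry $0_2$ and $0_3$. The same helper can be used to confirm the path symmetry noted earlier: `y` is reachable from `x` exactly when `complement(x)` is reachable from `complement(y)`.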

We are also interested in those alleles that require $x$ for their own existence in a viable
genotype; $D_F^-(x)$ denotes the in-graph of $x$. The allele set of the in-graph of $x$ is denoted

$$L^-(x) = L_F^-(x) = \{y : y \leadsto x\}.$$

Suppose that $y \in L_F^+(x) \cap L_F^-(x)$, so $x \leadsto y$ and $y \leadsto x$. In this case $x$ and $y$ are said to be
strongly connected. The set of alleles

$$C_F(x) = \{y : x \leadsto y \leadsto x\}$$

that are strongly connected to $x$ is referred to as the strong component of $x$. We follow the
convention that $x \leadsto x$ for every vertex $x \in D_F$. Consequently, every $x$ is a member of a strong
component and $C_F(x)$ is well defined for every allele. Thus, the set of all $2n$ alleles can be
partitioned into strong components of $D_F$ and a partial order can be defined on the set of strong
components. Let

$$C_F(x) \le C_F(y) \quad \text{if } x \leadsto y,$$

which also implies that $x' \leadsto y'$ for any $x' \in L_F^-(x) \supseteq C_F(x)$ and $y' \in L_F^+(y) \supseteq C_F(y)$.

We think of the alleles of a strong component as depending upon each other for survival;
a viable genotype either has all or none of the alleles within a strong component. If a strong
component is of size one, i.e., $C_F(x) = \{x\}$, then this is an empty statement and we say that
the strong component of $x$ is trivial. Nontrivial strong components represent genetic groupings
and the clustering behavior of $Q_F$ is determined by these groupings, so we are quite interested
in strong components. However, combinatorial problems relating to strong components can
become unwieldy. We will avoid these difficulties by focusing our attention on directed cycles
rather than strong components. We will see that in this random setting, strong components
tend to be cycles.

Notice that for any strong component $C_F(x)$ there is a unique subgraph of $D_F$ that contains
the alleles of $C_F(x)$ along with all the edges between them that are present in $D_F$. Thus, there
should be no confusion if we refer to these subgraphs as strong components as well. With this
graphical notion of a strong component in mind, we notice that if $x$ and $y$ are distinct alleles
in the same strong component of $D_F$, then there is a closed directed walk in $D_F$ containing $x$
and $y$. When a closed directed walk of length $i$ is allele disjoint, i.e., has $i$ edges and $i$ alleles, then we
say that it is a simple cycle of length $i$, or more succinctly, an $i$-cycle. We refer to a closed walk
that contains one or more repeated alleles as a compound cycle.

Next, for any set of alleles $M$, we define

$$\bar{M} = \{\bar{y} : y \in M\}.$$

When $M \cap \bar{M} = \emptyset$, we say that $M$ and $\bar{M}$ are complementary. Notice that if $y \in C_F(x)$, then
$\bar{y} \in C_F(\bar{x})$, so $\overline{C_F(x)} = C_F(\bar{x})$ for every allele $x$. Moreover,

$$C_F(x) \cap \overline{C_F(x)} \neq \emptyset \iff C_F(x) = \overline{C_F(x)}.$$

If there is an allele $x$ such that $C_F(x) = C_F(\bar{x})$, then there are alleles $y, z \notin \{x, \bar{x}\}$ (not necessarily
distinct) such that

$$x \leadsto y \leadsto \bar{x} \leadsto z \leadsto x$$

and we say that $x$ (and each $y \in C_F(x)$) is on a contradictory cycle. This cycle may be compound
or simple. Since the allele $x$ depends upon its complement $\bar{x}$ for survival and vice versa, neither
allele can be assigned at the corresponding locus and there are no viable genotypes. On the
other hand, it can be shown that if there are no contradictory cycles, then there is at least one
viable genotype (see [5] for example). Thus, a formula $F$ allows viable genotypes in $Q_F$ if and
only if there is no contradictory cycle in the associated digraph $D_F$.
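This criterion can be compared against brute force on tiny instances. The sketch below is illustrative and makes several assumptions: alleles are `(locus, value)` pairs, the function names are hypothetical, reachability is recomputed naively for each query, and the comparison runs over every subset of the four allele pairs on $n = 2$ loci.

```python
from itertools import combinations, product

def complement(allele):
    """Swap the value of an allele (locus, value)."""
    locus, value = allele
    return (locus, 1 - value)

def reachable(n, incompatibilities, start):
    """Alleles reachable from `start` in the implication digraph (start included)."""
    edges = {(i, v): set() for i in range(1, n + 1) for v in (0, 1)}
    for x, y in incompatibilities:
        edges[x].add(complement(y))   # x -> not-y
        edges[y].add(complement(x))   # y -> not-x
    seen, stack = {start}, [start]
    while stack:
        for w in edges[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def has_contradictory_cycle(n, incompatibilities):
    """True if some locus has both of its alleles reachable from each other."""
    return any((i, 1) in reachable(n, incompatibilities, (i, 0)) and
               (i, 0) in reachable(n, incompatibilities, (i, 1))
               for i in range(1, n + 1))

def has_viable_genotype(n, incompatibilities):
    """Brute-force check over all 2^n genotypes."""
    return any(not any(g[i - 1] == a and g[j - 1] == b
                       for (i, a), (j, b) in incompatibilities)
               for g in product((0, 1), repeat=n))

# Every subset of the 4 allele pairs on n = 2 loci, treated as a formula.
n = 2
all_pairs = [((1, a), (2, b)) for a in (0, 1) for b in (0, 1)]
agree = all(has_contradictory_cycle(n, F) != has_viable_genotype(n, F)
            for r in range(len(all_pairs) + 1)
            for F in combinations(all_pairs, r))
```

On these sixteen formulas `agree` is `True`: a contradictory cycle appears exactly when no genotype is viable, which here happens only when all four pairs are forbidden.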

3 The number of clusters 
3.1 A pair of deterministic results 

In this subsection we consider the relationship between a fixed formula $F$ and the structure of
$Q_F$. Whenever a formula $F$ is satisfiable, there is no allele $x$ such that $C_F(x) = C_F(\bar{x})$, which
implies that for a digraph corresponding to a viable genotype space, strong components come
in complementary pairs $(C_F(x), C_F(\bar{x}))$.

These pairs are important for our model of the cube as a genotype space because they 
represent alternate strategies for viability. If the members of a pair are related in the partial 
order on strong components then only the strategy corresponding to the greater component is 
viable because the alleles in the lesser component require their complements for survival, which 
is not allowed. On the other hand, we will see that if there is no order relation between members 
of a nontrivial pair, then the pair splits the viable genotypes into disconnected clusters. 

Consider the following example, which is illustrated in Figure 1. We begin with a set of
incompatibilities, $F_1 = \{(0_2, 1_3), (0_1, 1_2)\}$. We see that all viable genotypes are connected in
$Q_{F_1}$ and that $D_{F_1}$ has only trivial strong components. We then add an incompatibility to form
$F_2 = F_1 \cup \{(1_1, 0_2)\}$. In this case, there are two clusters of viable genotypes in $Q_{F_2}$ and a nontrivial
strong component pair in $D_{F_2}$ that is unrelated in the partial order on strong components. By
the addition of another incompatibility we get $F_3 = F_2 \cup \{(0_1, 0_2)\}$, where there is an order relation
between the strong component pair: $C_{F_3}(0_1) \le C_{F_3}(1_1)$, and every viable genotype in $Q_{F_3}$ has
all the alleles $1_1$ and $1_2$ which comprise $C_{F_3}(1_1)$.

This motivates the following definition: a splitting pair is defined to be a pair of distinct,
nontrivial strong components $(C, \bar{C})$ in $D_F$ that are unrelated in the order on strong components.
Furthermore, we say that a set $S$ of loci contains the splitting pair $(C, \bar{C})$ if each allele $x \in C$
corresponds to a locus in $S$.

Lemma 2. Suppose $F$ is a satisfiable 2-formula and that $u$ and $v$ are any two viable genotypes
in $Q_F$, and let $S$ be the set of loci on which $u$ and $v$ differ. Then $u$ and $v$ are connected in $Q_F$
if and only if $S$ contains no splitting pairs in $D_F$.

Proof. Suppose $S$ contains a splitting pair $(C, \bar{C})$ in $D_F$, on $k > 1$ loci. Consider the subcubes
of $Q^n$ which have alleles fixed to $C$ and $\bar{C}$ respectively. The Hamming distance between these
subcubes is $k > 1$ and there are no viable genotypes off these subcubes. Since $u$ and $v$ are in
different subcubes, they are not connected by a viable mutational path and we see that splitting
pairs are sensibly defined.

Figure 1: An example for $n = 4$ where the dimensions on the cube are ordered left/right,
down/up, front/back, and in/out. The inviable (shaded) genotypes are left to give perspective
and to demonstrate the way incompatibilities eliminate subcubes (shaded). Notice the
progression from $F_1 = \{(0_2, 1_3), (0_1, 1_2)\}$, where there are no cycles on strictly distinct alleles, to
$F_2 = F_1 \cup \{(1_1, 0_2)\}$, where there is a pair of complementary cycles but no order between the
cycles, to $F_3 = F_2 \cup \{(0_1, 0_2)\}$, where there is an order relation between the pair of cycles.

On the other hand, suppose $S$ contains no splitting pairs in $D_F$. We must show that there
is a path between $u$ and $v$ in $Q_F$. We think of each step toward $v$ as crossing out an element
of $S$. Let $\bar{x}$ and $x$ be alleles of $u$ and $v$ respectively at an arbitrary locus in $S$. We construct a
path starting from $u$ and ending in a genotype $w$ with allele $x$, such that each step decreases the
distance to $v$, and such that each genotype on the path is viable. This is all that must be shown
because each step crosses out an element of $S$ where $|S| < \infty$ and the locus of $x$ was arbitrary
in $S$.

Suppose $x$ is an allele which has no outgoing edges in $D_F$. Then there are no incompatibilities
involving $x$, and a mutation from a viable genotype with allele $\bar{x}$ to the neighboring genotype
with allele $x$ will be a mutation between viable genotypes. More generally, the one step mutation
from $\bar{x}$ to $x$ will not make a viable genotype inviable if and only if each of the $n - 1$ shared alleles
$y$ is such that $\bar{y} \notin L^+(x)$. Perhaps it is more intuitive to say that any allele in $L^+(x)$ other than
$x$ is an allele that is shared by both genotypes. Since we are assuming that $v$ is viable with the
allele $x$, we know that any allele $y \in L^+(x)$ is an allele of $v$. What must be shown is that we can
follow a path of one step mutations between viable genotypes, which begins at $u$ and picks up
all the alleles except $x$ in $L^+(x)$ which $u$ does not already contain. Then we will be able to make
a one step mutation from $\bar{x}$ to $x$ that is between viable genotypes. We equate these steps in $Q_F$
with the removal of an allele from $L^+(x)$. Thus, we wish to remove all the alleles of $L^+(x)$.

Notice that the alleles of $L^+(x)$ are strictly distinct because $v$ is viable. Also, notice that if
$u$ has any allele $y \in L^+(x)$, then $u$ has all the alleles of $L^+(y)$, else $u$ is inviable. Thus, all the
alleles in the out-graphs of the alleles common to $u$ and $v$ are common to $u$ and $v$, so we need
not make any of these mutations. Consequently, if we remove $D^+(y)$ from $D^+(x)$ for each allele
$y$ common to $u$ and $v$, the remaining trimmed out subgraph of $D^+(x)$ will still be connected and
any steps that we wish to make will concern only alleles in the trimmed out subgraph. Thus,
we can assume that every allele in $L^+(x)$ that is common to $u$ and $v$ has already been removed.

Suppose there is a nontrivial strong component $C \subset L^+(x)$. If there is an order relation
between $C$ and $\bar{C}$, then $\bar{C} \le C$ because $v$ is viable; but so too is $u$, so the alleles of $C$ are
common to $u$ and $v$ and have already been removed from $L^+(x)$. Likewise, if there is no order
relation between $C$ and $\bar{C}$, then $(C, \bar{C})$ is a splitting pair; so $u$ and $v$ share the alleles of $C$ by
assumption. Consequently, we have shown that once the common alleles of $u$ and $v$ have been
removed from $L^+(x)$, the trimmed out subgraph of $D^+(x)$ that remains is a directed tree. This
means that only one step mutations are necessary to get from $u$ to a genotype with the alleles
of $L^+(x)$.

To be precise, consider the leaves of the trimmed out subgraph. Any leaf $y$ is free of
incompatibilities, so a mutation from a viable genotype with $\bar{y}$ to a genotype with $y$ will result in a
viable genotype. Thus, by the viability of $u$, we see that the entire subcube of $Q^n$ formed by
varying the loci of these leaves from $u$ is viable. Let $u^*$ be the genotype that results from making
all these mutations. As we walk from $u$ to $u^*$ we trim the leaves and edges ending in those leaves
from $D^+(x)$. The leaves of this new trimmed graph correspond to loci which can be varied from
$u^*$ to form another viable subcube. Since $D^+(x)$ is finite, we can continue trimming the leaves
until we are left with $x$ alone. At this point we will be free to follow the mutation from $\bar{x}$ to $x$
in $Q_F$ and the path from $u$ to $w$ is complete.

□

Lemma 2 does not address the existence of viable genotypes following the two possible
strategies implied by each splitting pair, and we turn to this issue next. As above, suppose $F$ is
satisfiable and that $(C, \bar{C})$ is a splitting pair. Since $F$ is satisfiable, there is a viable genotype
following at least one of the strategies $C$ or $\bar{C}$, say $u$ is following $C$. We know that there is no
$z \in \bar{C}$ such that $z \leadsto \bar{z}$, else $(C, \bar{C})$ is not a splitting pair, so $A = \bigcup_{z \in \bar{C}} L^+(z)$ is a set of strictly
distinct alleles and there are no incompatibilities among the pairs in $A$. Furthermore, none of
the alleles of $u$ are incompatible. Thus, if there are not any incompatibilities $(x, y)$ such that $x$
is an allele of $A$ while $y$ is an allele of $u$ which is strictly distinct from the alleles of $A$, then the
genotype with all the alleles of $A$ and alleles of $u$ assigned on the remaining loci will be viable.
Suppose there were such an incompatibility. Then $x \leadsto \bar{y}$, so $\bar{y} \in A$, but then $y$ is not strictly
distinct from the alleles of $A$, so we have a contradiction.

Thus, for every pair of potential strategies, there are viable genotypes with each strategy and
we have the following

Corollary 3. If $F$ is satisfiable and there are $k$ splitting pairs corresponding to $F$, then the
number of clusters of viable genotypes in $Q_F$ is $2^k$.
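The example of Figure 1 gives a quick numerical check of this count. The sketch below computes the number $k$ of splitting pairs from the digraph and, independently, the number of clusters by brute force, for the three formulas of the example with $n = 4$. It is only an illustration on that example: the `(locus, value)` encoding and all function names are assumptions, and strong components are found by naive mutual reachability rather than by an efficient algorithm.

```python
from collections import deque
from itertools import product

def complement(allele):
    return (allele[0], 1 - allele[1])

def reach_sets(n, incompatibilities):
    """Map each allele x to the set of alleles reachable from x (x included)."""
    edges = {(i, v): set() for i in range(1, n + 1) for v in (0, 1)}
    for x, y in incompatibilities:
        edges[x].add(complement(y))
        edges[y].add(complement(x))
    reach = {}
    for x in edges:
        seen, stack = {x}, [x]
        while stack:
            for w in edges[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        reach[x] = seen
    return reach

def count_splitting_pairs(n, incompatibilities):
    """Nontrivial complementary strong component pairs with no order relation."""
    reach = reach_sets(n, incompatibilities)
    components = {frozenset(y for y in reach[x] if x in reach[y]) for x in reach}
    pairs = set()
    for comp in components:
        if len(comp) < 2:
            continue                      # trivial strong component
        x = next(iter(comp))
        xbar = complement(x)
        if xbar in reach[x] or x in reach[xbar]:
            continue                      # the pair is related in the order
        comp_bar = frozenset(complement(y) for y in comp)
        pairs.add(frozenset((comp, comp_bar)))
    return len(pairs)

def count_clusters(n, incompatibilities):
    """Connected components of viable genotypes under single-locus mutations."""
    unvisited = {g for g in product((0, 1), repeat=n)
                 if not any(g[i - 1] == a and g[j - 1] == b
                            for (i, a), (j, b) in incompatibilities)}
    clusters = 0
    while unvisited:
        queue = deque([unvisited.pop()])
        clusters += 1
        while queue:
            g = queue.popleft()
            for k in range(n):
                h = g[:k] + (1 - g[k],) + g[k + 1:]
                if h in unvisited:
                    unvisited.remove(h)
                    queue.append(h)
    return clusters

F1 = [((2, 0), (3, 1)), ((1, 0), (2, 1))]
F2 = F1 + [((1, 1), (2, 0))]
F3 = F2 + [((1, 0), (2, 0))]
results = [(count_splitting_pairs(4, F), count_clusters(4, F)) for F in (F1, F2, F3)]
```

For $F_1$, $F_2$, $F_3$ the pairs (splitting pairs, clusters) come out as $(0, 1)$, $(1, 2)$ and $(0, 1)$, in line with the description of the example.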



3.2 Proof of the theorem 



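Let $Y_n$ be the random variable that counts the number of splitting pairs in $D_F$. We will show
that $Y_n$ converges weakly to the Poisson random variable $\Psi$ with mean

$$\lambda = -\tfrac{1}{2}\left(\ln(1-c) + c\right).$$

We begin by discussing $i$-cycles. Let $\Gamma_{i,n}$ be the set of all possible complementary $i$-cycle pairs
on strictly distinct alleles and define

$$\Gamma_n = \bigcup_{i=2}^{n} \Gamma_{i,n}.$$

For each $\alpha \in \Gamma_n$ let

$$I_\alpha = \begin{cases} 1 & \text{if } \alpha \in D_F, \\ 0 & \text{else.} \end{cases}$$

Then

$$X_{i,n} = \sum_{\alpha \in \Gamma_{i,n}} I_\alpha$$

is the random variable that counts the number of $i$-cycle pairs in $D_F$ and

$$X_n = \sum_{i=2}^{n} X_{i,n}$$

counts the total number of simple cycle pairs in $D_F$.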

Notice that the number of directed $i$-cycles on strictly distinct alleles is

$$\frac{(n)_i\, 2^i}{i},$$

where $(n)_i = n(n-1)\cdots(n-(i-1))$. For each such cycle

$$\alpha = y_1 \to y_2 \to \cdots \to y_i \to y_1$$

that is present in $D_F$, the complementary cycle

$$\bar{\alpha} = \bar{y}_1 \leftarrow \bar{y}_2 \leftarrow \cdots \leftarrow \bar{y}_i \leftarrow \bar{y}_1$$

is also in $D_F$. Thus, the expected number of $i$-cycle pairs is

$$\mu_{i,n} = E(X_{i,n}) = \sum_{\alpha \in \Gamma_{i,n}} p^i = \frac{(n)_i\, 2^i}{2i}\, p^i = \frac{(n)_i}{n^i} \cdot \frac{c^i}{2i}.$$

If $c < 1$, then we also have that the expected total number of simple cycle pairs is

$$\lambda_n = E(X_n) = \sum_{i=2}^{n} \mu_{i,n} \to -\tfrac{1}{2}\left(\ln(1-c) + c\right).$$

This immediately implies that if $c < 1$ and $\omega = \omega(n) \to \infty$, then

$$P\left(\sum_{i=\omega}^{n} X_{i,n} > 0\right) \le E\left(\sum_{i=\omega}^{n} X_{i,n}\right) \to 0,$$

which suggests that truncated random variables should be enough for weak convergence. Thus,
we define

$$T_{m,n} = \sum_{i=2}^{m} X_{i,n}.$$
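The convergence of the expected number of simple cycle pairs, and the smallness of the tail beyond a fixed $m$, can be seen numerically. The sketch below evaluates $\mu_{i,n} = \frac{(n)_i}{n^i}\frac{c^i}{2i}$ for a given $n$; the function names and the values $c = 0.5$ and $m = 20$ are assumptions made for illustration.

```python
import math

def expected_cycle_pairs(n, c):
    """Return {i: mu_{i,n}} for 2 <= i <= n, with mu_{i,n} = (n)_i / n^i * c^i / (2i)."""
    mus, ratio = {}, 1.0              # ratio tracks (n)_i / n^i
    for i in range(1, n + 1):
        ratio *= (n - (i - 1)) / n
        if i >= 2:
            mus[i] = ratio * c ** i / (2 * i)
    return mus

def limit_mean(c):
    """The limit -(ln(1 - c) + c) / 2 of the expected number of cycle pairs."""
    return -0.5 * (math.log(1.0 - c) + c)

c, m = 0.5, 20                        # illustrative choices
for n in (10, 100, 1000):
    mus = expected_cycle_pairs(n, c)
    lam_n = sum(mus.values())                          # expectation of X_n
    tail = sum(v for i, v in mus.items() if i > m)     # cycles longer than m
    print(n, lam_n, limit_mean(c), tail)
```

Each $\mu_{i,n}$ is bounded by $c^i/(2i)$, so the printed sums increase toward the limit from below, and the contribution of cycles longer than $m$ is bounded by a geometric tail that does not depend on $n$.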

One way to prove the weak convergence of $Y_n$ to $\Psi$ is to show that if $A \subset \mathbb{Z}^+$ is any subset
of the positive integers and $\varepsilon > 0$ is any constant, then there is an $n$ such that

$$|P(Y_n \in A) - P(\Psi \in A)| < \varepsilon.$$

Let $\lambda_{m,n}$ be the expectation of $T_{m,n}$ and $\Psi_{m,n}$ be the Poisson random variable with mean
$\lambda_{m,n}$. Then the difference above can be bounded above by

$$|P(Y_n \in A) - P(T_{m,n} \in A)| \qquad (2)$$
$$+\, |P(T_{m,n} \in A) - P(\Psi_{m,n} \in A)| \qquad (3)$$
$$+\, |P(\Psi_{m,n} \in A) - P(\Psi \in A)|. \qquad (4)$$

Thus, to prove the weak convergence of $Y_n$ to $\Psi$ it is enough to show that for every $\varepsilon > 0$ there
is an $m$ and an $n$ which may depend on $m$, such that (2), (3) and (4) are each less than $\varepsilon/3$.

We can certainly find $m$ and $n$ for (4) so we focus on (2) and (3). Notice that (2) is bounded
above by $P(Y_n \ne T_{m,n})$, which is in turn bounded above by the expected number of events which
distinguish $Y_n$ from $T_{m,n}$. To understand these distinguishing events, notice that if there are no
$i$-cycles for $i > m$, no compound cycles of any length, and no paths between any complementary
sets of alleles of size at most $m$, then $Y_n = T_{m,n}$. Thus, the lemma below provides the desired
bound for (2).

Lemma 4. Let $p = c/(2n)$ and $c < 1$. Let $m < \infty$ be fixed and $\varepsilon_m = \sum_{i=m+1}^{\infty} \frac{c^i}{2i}$. Then

(i) The expected number of complementary simple cycle pairs in $D_F$ of length greater than $m$
is less than $\varepsilon_m$.

(ii) The expected number of compound cycles of $D_F$ on strictly distinct alleles is $O(n^{-1})$.

(iii) The expected number of complementary simple cycle pairs in $D_F$ that are comparable in
the order relation on strong components is $O(n^{-1})$.

Proof. Item (i) follows from (1) above: summing the expected number of $i$-cycle pairs,
$\mu_{i,n} \le c^i/(2i)$, over $i > m$ gives a total less than $\varepsilon_m$.

Item (ii) follows from the observation that any compound cycle contains at least a simple
cycle $\alpha$ and a path $\beta$ with first and last alleles in $\alpha$. To bound the number of these, suppose
$\alpha \in \Gamma_{i,n}$ and $\beta$ is a path with $j > 0$ edges and both endpoints in $\alpha$. Then the expected number
of such configurations in $D_F$ is bounded above by

$$\binom{n}{i} 2^{i-1} (i-1)!\; i^2 (2n)^{j-1} p^{i+j} \le i\,(2n)^{i+j-1} p^{i+j} = \frac{c^{i+j}\, i}{2n}.$$

Summing over $1 \le j \le n - i$ and then over $2 \le i \le n$ gives the desired result.

For (iii), notice that the expected number of complementary simple cycle pairs that are
comparable in the order relation on strong components is exactly the same as the expected
number of cycle and path pairs described above. Indeed, in either case, there are $i(i-1)$ paths
of length one and $i^2$ choices for the endpoints of longer paths and exactly the same alleles to
choose from for intermediary alleles on the longer paths. □

The supremum of (3) is half the total variation between the law of $T_{m,n}$ and $\Psi_{m,n}$. Thus,
the bound for (3) follows from

Lemma 5. Let $m < \infty$ be fixed. Then the total variation between the law of $T_{m,n}$ and the Poisson
random variable with mean $\lambda_{m,n}$ is $O(n^{-1})$.

Proof. Let $\Gamma_\alpha$ be the set of all complementary cycle pairs that are distinct from $\alpha$ and yet not
independent of $\alpha$. Also, let $E(I_\alpha) = p_\alpha$. The local Chen-Stein method [2] gives the following
upper bound on the total variation between the law of $T_{m,n}$ and $\Psi_{m,n}$:

$$\min\{1, \lambda_{m,n}^{-1}\} \left( \sum_{\alpha \in \Gamma_n} p_\alpha^2 + \sum_{\alpha \in \Gamma_n,\, \beta \in \Gamma_\alpha} \left( p_\alpha p_\beta + E(I_\alpha I_\beta) \right) \right).$$

Since $\lambda_{m,n}$ is bounded below we need only show that each of the sums above is $O(n^{-1})$.
Notice also that, for each fixed $2 \le i \le n$, $\mu_{i,n}$ is bounded below in $n$ and that the sums above
strictly contain the related sum for any $X_{i,n}$. Thus, if the sum above is $O(n^{-1})$, then so too is
the corresponding sum for each $X_{i,n}$. Consequently, the fact that $X_{i,n}$ converges weakly to the
Poisson random variable with mean $c^i/(2i)$ follows from the proof given here.

Consider the first sum

$$\sum_{\alpha \in \Gamma_n} p_\alpha^2 = \sum_{i=2}^{n} \mu_{i,n}\, p^i \le \sum_{i=2}^{\infty} \frac{c^i}{2i} \left(\frac{c}{2n}\right)^{i}.$$

Since this sum is $O(n^{-2})$, we can address the second sum. If $I_\alpha$ and $I_\beta$ are not independent,
then they share some edge and the event that $I_\alpha = 1$ increases the probability that $I_\beta = 1$, so

$$E(I_\alpha I_\beta) \ge p_\alpha p_\beta$$

and we need only show convergence of the sum

$$\sum_{\alpha \in \Gamma_n,\, \beta \in \Gamma_\alpha} E(I_\alpha I_\beta). \qquad (5)$$

Let $\alpha \in \Gamma_{i,n}$ and let $\beta$ be such that the edges of $\beta$ which are shared with those of $\alpha$ are found
on $j$ disjoint paths in each of its members and such that there are $k$ alleles on each member of
$\beta$ which are not on the paths common to both $\alpha$ and $\beta$. Then, the probability that any such
fixed pair $(\alpha, \beta)$ is present in $D_F$ is $p^{i+j+k}$.

We will count the number of such pairs by first examining the number of possible shared
paths. Notice first that $1 \le j \le \lfloor i/2 \rfloor$ and by choosing $2j$ alleles in $\alpha$ we count the number of
possible endpoints of the shared paths in one of the cycles of $\alpha$. Between any two endpoints
that have no other endpoints between them, the edges connecting them are either a shared path
or not, but once this choice is made for one such pair, it is made for all the endpoints. Thus,
the number of possible shared paths is



2jJ 2ij!2i-i(j-l)! ./! J V.jV 

The $k$ alleles on each member of $\beta$ could be on $\alpha$ or not and the ordering of these alleles is
interspersed with the ordering of the shared paths. There are fewer than $(2n)^k$ orderings
for the $k$ alleles and at most $j!$ orderings allowed for the shared paths. Once these have been
determined, we must choose where the $j$ ordered paths are placed among the $k$ ordered alleles.
If there were no restrictions on interspersal, then this would be equivalent to forming weak
compositions of $k$ into $j$ parts. Thus, we get the following upper bound on the number of ways
to finish $\beta$ once $\alpha$ and the shared paths have been determined:

$$(2n)^k\, j!\, \binom{k+j-1}{j-1}.$$


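Hence, for fixed $i$, $j$ and $k$, an upper bound on the expected number of such ordered pairs $(\alpha, \beta)$
in $D_F$ is the product of the expected number $\mu_{i,n}$ of choices for $\alpha$, the number of possible shared
paths, the number of ways to finish $\beta$, and the probability $p^{j+k}$ of the additional edges of $\beta$.

Since $(2n)^k p^k = c^k$, the sum over the terms involving $k$ can be bounded above by the sum over
all $k \ge 0$:

$$\sum_{k=0}^{\infty} \binom{k+j-1}{j-1} c^k = \left(\frac{1}{1-c}\right)^{j}.$$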



The sum over j is bounded as follows 

Li/2J 



E 



;i - c)n 



U/2J 

E 



i - 1 
j-1 



[1 — c)n 



< 



cr 



fl — c)n I ' \ i 



a 



(1 — c)n 



4(1 - c) 



(1 — c)n 



Using the bounds

$$1 + x \le e^{x} \quad \text{and} \quad \frac{(n)_i}{n^i} = \prod_{l=1}^{i-1}\left(1 - \frac{l}{n}\right) \le e^{-\binom{i}{2}/n},$$

we can get an upper bound for the sum over $2 \le i \le m$

P 



— Y 



ic exp 



i=2 



-1 + 



[l-c)J 2n 



Since $m$ is fixed, the sum (6) is $O(n^{-1})$, which completes the proof of the lemma. □



This also proves that $Y_n$ converges weakly to $\Psi$ and consequently that the probability that
there is no splitting pair converges to $e^{-\lambda}$. Moreover, since the formula is satisfiable a.a.s.,
$N_n$ and $2^{Y_n}$ both converge weakly to $N = 2^{\Psi}$. Thus, we have completed the proof of the theorem.



3.3 Discussion 

We find the structure of this fitness landscape to be interesting because, while the number of 
viable genotypes grows exponentially with the number of loci, the number of clusters of these 
genotypes is relatively small and may be finite in expectation. Computer simulations have been 
run [12] that suggest this is the case, but we have not been able to determine whether or not 
the expectation of $N_n$ converges. This appears to present a challenge because one must consider
correlation between cycles in $D_F$ of unbounded length. For a discussion of correlation between
undirected cycles of bounded length, see [1]; or for a result concerning directed cycles, see [14].